Testing simplified protein models of the hPinl WW domain 
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The WW domain of the human Pint protein for its simple topology and the large amount of exper- 
imental data is an ideal candidate to assess theoretical approaches to protein folding. The purpose of 
the present work is to compare the reliability of the chemically-based Sorenson/Head-Gordon (SHG) 
model and a standard native centric model in reproducing through molecular dynamics simulations 
some of the well known features of the folding transition of this small domain. Our results show 
that the Go model correctly reproduces the cooperative, two-state, folding mechanism of the WW- 
domain, while the SHG model predicts a transition occurring in two stages: a collapse followed by 
a structural rearrangement. The lack of a cooperative folding in the SHG simulations appears to be 
related to the non-funnel shape of the energy landscape featuring a partitioning of the native valley 
in sub-basins corresponding to different chain chiralities. However the SHG approach remains more 
reliable in estimating the $-values with respect to Go-like description. This may suggest that the 
WW-domain folding process is stirred by energetic and topological factors as well, and it highlights 
the better suitability of chemically-based models in simulating mutations. 

PACS numbers; 



I. INTRODUCTION 

The WW domains are a family of fast-folding, com- 
pact, modular domains featuring a triple-stranded, anti- 
parallel beta-sheet owing their name to the presence of 
two highly conserved Triptophanes (W). Recent studies 
suggested that these domains may fold at a rate close 
to the speed limit for /3-sheet formation offering the op- 
portunity to investigate the pathways of /3-sheet kinetics. 
13 . In particular, the human Pinl protein WW domain, 
due to the availability of several structural thermo- 
dynamical and kinetic experimental data, represents 
an excellent target to test computational techniques and 
theoretical approaches. 

The structure of this domain, resolved through 
NMR Hi and X-ray diffraction ||| (Fig. [J), is character- 
ized by hydrophobic clusters providing the largest con- 
tribution to the thermodynamic stability f3|. 

Cluster 1 (CLl) involves residues Leu7, Trpll, Tyr24 
and Pro37, the second cluster (CL2) comprises Tyr23, 
Phe25 and Argl4. The stability of the molecule also de- 
rives from a network of hydrogen-bonds whose central 
element is the highly conserved Asn26 located on strand 
(32 and acting both as donor and acceptor in bonds with 
Pro9, Trpll, Ile28 and Thr29, thus linking strands /3i 
and (3^. The presence of two loops. Loop I (LI) plays 
a key role in substrate recognition [3] as it binds to the 
phosphate of the pS-P motif of the Proline-rich ligands. 
Loop II (L2), instead, gives an important contribution to 
thermal stability Thermal denaturation experiments 
and simplified statistical physics approaches have 
shown that the Pinl WW domain folds following cooper- 
ative two-state mechanism at the temperature Tm = 332 
K. The mutagenesis analysis performed by the same au- 



thors identified the mutations on Serl6, Serl8 and 
Serl9 in Loop I as maximally destabilizing for the tran- 
sition state, so that the formation of LI appears to be 
the rate-limiting step in the folding/unfolding process. 
Loop II (L2) is involved in the formation of the transi- 
tion state only at high temperatures . Due to the abil- 
ity of inducing conformational changes in Proline-rich, 
phosphorilated substrates, Pinl is a potential regulator 
of the cell-cycle, and maybe involved in pathologies like 
Liddle's syndrome, muscular dystrophy and Alzheimer's 
disease 00- 

The aim of this work is the comparison of two off- 
lattice protein descriptions: the Go- model Q which cus- 
tomary allows studying the influence of the native state 
structure on the folding process, and a model proposed 
by Sorenson and Head-Gordon 0, ^ (hereafter re- 
ferred to as SHG), mainly based on the primary and 
secondary structural information. The conceptual justi- 
fication of topology-based or native centric models relies 
on the observation that the topology of native states can 
play a crucial role in selecting some features of the fold- 
ing mechanism 0, 0, 113. ITa | . The main experimental 
finding supporting the above statement can be summa- 
rized as follows (Baker 0|): (a) the similarity shared by 
transition-state conformations and folding mechanisms 
of proteins having structurally related native states de- 
spite their low sequence homology 0, 0, 0| and (b) 
the correlation that certain simple topological proper- 
ties, such as contact order, may have with protein folding 
rates [23, |^ • The Go- force field is independent of the 
amino acid sequence, and it requires the knowledge of 
the tertiary structure of native states to identify native 
interactions. Accordingly, the native centric approach 
cannot be used for ab initio predictions of native folds. 
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even if recent works m El ii m m provide growing 

evidence that it can be confidently used for the charac- 
terization of transition states of real small, fast-folding 
(sub- millisecond) [23, |2^ proteins that are characterized 
by a low level of energetic frustration. However, topo- 
logical models might not correctly reproduce the folding 
process when chemical interactions play a relevant role. 
The SHG model, which is instead based on the chemical 
and physical properties of amino-acids such as hydropho- 
bicity, is in principle better applicable to proteins with 
a higher level of energetic frustration. Moreover, requir- 
ing the knowledge of primary and secondary structures 
only, the model has a greater predictive power, and in 
this sense, could be considered closer to an ab initio rep- 
resentation. The above arguments motivate a detailed 
comparison between the these two protein models to as- 
sess their applicability and potentialities in the study of 
biomolecules. 





FIG. 1: Backbone representation of the NMR structure of 
Pinl WW domain (pdb-id = INMV, Left) and simulated 
structure To (Right). Residues in the three /3-strands are 
coloured in blue, those belonging to loops LI and L2 in yellow. 
Side-chains of residues participating in CLl (Leu7, Trpll, 
Tyr24, Pro37) are shown in green stick representation, those 
involved in CL2 (Argl4, Tyr23, Phe25) in magenta. Figures 
were drawn with RASMOL. 



II. THEORY AND METHODS 

When native state topology plays the relevant role in 
driving the folding process, many molecular details of 
protein structures can be mapped onto simplified coarse- 
grained models encoding the overall topology through 
the knowledge of the native contacts. These models, ne- 
glecting side chains and peptide groups, reduce a protein 
chain to its backbone, where aminoacids are assimilated 
to beads centred on their a carbon atoms. The Go energy 
function, mimicking a perfect funnel landscape, assigns 
to the native state the lowest energy by simply promoting 
the formation of native interactions. Here we employ the 



force field proposed by Clementi et al. [22l| with distance 
cutoff Rc — 6.5 to to identify native contacts in the 

structure INMV.pdb. A native contact means that the 
distance, Rij, of Cq atoms relative to residues i and j 
(N ~ j\ ^ 3) is less than Rc in the native state. This pair 
undergoes an attractive LJ-interaction 
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with equilibrium distance Rij . When two residues are 



not in a native contact (i?y 
a repulsive potential 

^nnat iXij ) ~ 



> Rc) they interact through 
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with a = 4.5 A. These non-native interactions, besides 
ensuring the self-avoidance of the chain, generally en- 
hance the cooperativity of the overall folding process -S^l . 
A further bias towards the native secondary structure is 
introduced through a bending and a dihedral potential. 
The former is modeled as an harmonic function and al- 
lows only small oscillations around the native angles (6*°) 
formed by three consecutive residues 

with harmonic constant kg — 20e. The most important 
determinant of the secondary structure is the dihedral 
potential arising from the torsional energy. Each dihedral 
angle, identified by four consecutive beads, contributes to 
the potential with the terms 



V^) = - cos(0, - (/.O)] -f k^^\l ~ cos 3(4 - 4>'l)] 

where (jJf is the value of angle i in the native structure, 

/c^^^ — e and k^^^ — 0.5e. Finally, consecutive residues 
interact with each other through the potential harmonic 
in their distance r^.i+i 



(2) 



which maintains the chain connectivity, with bi being the 
native bond-length and kh — lOOO/rg. Therefore, the 
global Go-potential reads 

N-l N-2 N-3 

VTot = ^ l^h(rM+l)+ ^ 14(0,)+ ^ K^(0.) + 

i—1 2—1 2—1 

at{Tij)^ij + (1 ^ij)Vnnat{Tij)} ■ 

i,j>i+2 

where Ay — 1(0) if the contact is native (non-native). 
Go models of the type just outlined may produce a grad- 
ual folding behaviour being uncapable to reproduce the 
typical kinetic cooperativity of two-state folders. Exper- 
imental studies suggest |3l[ [S^l that the origin of coop- 
erativity lies in specific interactions appearing only after 
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the assembly of native-like structures. These particular 
interactions can be modeled by imparting an extra ener- 
getic global stabilization to the native state through 
a different analytical form of the energy function when 
the chain visits the native basin. In the present work 
we need to implement the rescaling method to make the 
folding transition highly cooperative in agreement with 
the experiments on the WW domain. The interaction 
forces on the residues are thus computed according to 
the following rule 



^conf — 



-V14 - P^iVTot - 14) 



for Q < Qth 
for Q > Qth 



(3) 



where Q is the fraction of formed native contacts and 
p = 2 is the scaling factor. The force rescaling deter- 
mines a higher free energy barrier between the folded 
and unfolded state in correspondence of the folding tem- 
perature resulting in a higher cooperativity. Therefore, 
the residence times in the folded and unfolded state are 
expected to be significantly longer. 

The SHG model is an off-lattice minimal model that 
generalizes a previous model introduced by Thirumalai 
and co-workers [s^ . ISSj . This approach represents a- 
carbons with beads of three possible flavours hydrophobic 
(B), hydrophilic (L) and neutral (N), according to table 

m 
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Ala 
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Met 


B 


Gly 


N 


Asn 


L 


Cys 


B 


Val 


B 


Ser 


N 


His 


L 


Leu 


B 


Trp 


B 


Thr 


L 


Gin 


L 


He 


B 


Tyr 


B 


Glu 


L 


Lys 


L 


Phe 


B 


Pro 


B 


Asp 


L 


Arg 


L 



TABLE L Dictionary for the translation of three-letter code 
of the 20 natural amino-acid into the three-flavor code [Tl|. 



The driving force responsible for the collapse onto a 
compact structure is the attraction between B-beads, 
whereas the repulsion between L and N beads deter- 
mines the rearrangements of the compact structure into 
the native topology. The long-range interactions between 
residues which may be far apart in sequence space is mod- 
elled through the potential: 



Vlr = ^ QiSi 



12 / \ 6' 



(4) 



where (1.65 Kcal mol~^ see below) sets the energy 
scale and a — 4.0 A. The attractive forces between hy- 
drophobic residues is attained by setting S*! = 5*2 = 1 for 
BB pairs, while the interactions involving the LL and LB 
pairs are characterized by 5*1 = 1/3 and 5*2 = —1. This 
interaction is repulsive and the term, which accounts 
for the hydration shell around the hydrophilic residues, 
makes the potential longer ranged than the usual r~^^. 



The forces involving neutral residues are also repulsive 
and amount to an excluded volume potential by setting 
5i = and 5*2 = 0. The secondary structure arises as 
a result of bending and dihedral interactions, which sur- 
rogate side-chain packing and hydrogen-bonding. The 
analytic expression of the dihedral potential is 

Vdih = + cos0i) + Bi{l - cos0i) + 

Ci{l + cos30i) + A(l + cos{(j)^ + 7r/4))] (5) 

where indicates the angle between the two adja- 
cent planes identified by the positions of four consecutive 
beads. The information on secondary structures is sys- 
tematically stored in the coefficients A, B, C and D that 
determine a bias on the angles reflecting the propensity 
of residues to form a specific secondary motif. Indeed, 
each dihedral, in the chain, is defined to be either he- 
lical (H: A, = 0, B, = ^ Di = l.2eh), extended 
(E: A, = 0.96,,, a = 1.2eft,S, = A = 0), or turn (T: 
A, = Bi = A = 0,Q = 0.2eh). Therefore, the pri- 
mary structure must be complemented with the auxil- 
iary sequence, of "E,II,T" symbols assigning the appro- 
priate set of coefficients. The decoupling between pri- 
mary and dihedral sequence, not present in similar mod- 
els (40,41), increases the possibilities in the modulations 
of relative strengths between local and non-local interac- 
tions which results in a finer structural tuning ,1L | . The 
Head-Gordon force field is completed by a bond angle 
interaction modelled as a harmonic potential 



N-2 



(6) 



with a constant kg — 20eh/ {radY so that large devi- 
ations from the equilibrium value 6^ = 1.8326 rad are 
unlikely, and bond angles result basically fixed. Also 
in this model, stiff springs (O with equilibrium dis- 
tance ro — 3.8 A, maintain the chain connectivity mim- 
icking the presence of covalent peptide bonds between 
successive aminoacids. This stiff interaction allows to 
keep the bond length approximately fixed, while being 
less computationally demanding than the RATTLE al- 
gorithm used in previous works |3 US to enforce fixed 
bond lengths. The SHG model retains only the min- 
imal number of elements needed to capture the essen- 
tial features of protein molecules, however, some strong 
determinants such as hydrogen bonding and side chains 
are missing. These limitations should be compensated 
through a design strategy |^ for optimising the se- 
quence. Here, we used the sequence LBBNN-BLBLB- 
NLNNN-LBBBB- LLNNL-BNBBL-LBNNL proposed in 
Ref. TTl for the hPinl WW domain and designed via 
a threading approach based on energy gap maximiza- 
tion (42). The secondary structure propensity, select- 
ing the native- like dihedral angles, is encoded in the 
auxihary sequence TTTTT-EEEEE- TTTTT-TEEEE- 
TTTTT-EETTT-EET, buih directly through the infor- 
mation contained in the PDB file INMB.pdb. In order 
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to control the temperature, we performed constant tem- 
perature MD simulations within the isokinetic scheme 
[88| using dimensionless quantities. The temperature was 
measured in units of e/i/i? — 1070.96 K, time in units of 
t = a[i.hlMfl'^^ 4.44 ps, (a = 4.0 A is the equilib- 
rium length of Lcnnard- Jones interactions, M ~ 110 is 
the average aminoacid mass) , energy in units e/i , specific 
heat in units R = 1.9872 x 10~^ Kcal mol^^K^^ and the 
radius of gyration in units a. The energy scale was 
set to 1.65 Kcal mol~^ to reach a denaturation temper- 
ature compatible with experimental data @ T = 332 K. 
For the Go-model the same units apply, except for the 
energy scale set to e = 0.66 Kcal mol~^. During the 
simulations, we monitored the difference from the native 
state or reference state through the overlap Q, represent- 
ing the fraction of formed native contacts 



j>i+3 



where, the sum runs over all pairs of native contacts, 
r,j and Rij are the distances of residues i and j in the 
current and in the reference conformation respectively 
and 0(u) indicates the unitary step function. A value 
Q = 1, indicates that the conformation is native-like, 
while values close to zero refer to denaturated states, we 
also considered, as further reaction coordinates of the 
folding/unfolding process, the gyration radius and the 
root mean square distance (RMSD) between the cur- 
rent and reference conformations after an optimal su- 
perposition performed according to Kabsch's algorithm 
[221 • The thermodynamics of the folding/unfolding tran- 
sition was obtained via the weighted histogram method 
[iol l4ll | . This technique offers the possibility to gain a 
better sampling of the conformation space than ordinary 
methods. The procedure consists in storing bidimen- 
sional histograms of the number of contacts -/V(iJ, Q) as 
a function of the energy E and coordinate Q at each tem- 
perature run. Such histograms are then optimally com- 
bined to reconstruct the best estimate of the density of 
states i}{E,Q), which, in turn, will be used to compute 
the thermodynamics of the system. The knowledge of 
^{E, Q) can be also employed to derive the probability 
that, at temperature T, the protein states are character- 
ized by energy E and reaction coordinate Q 

Pt{E, Q) = n{E, Q) exp{-/3(£; - F)} 

where /3 = 1/RT and F is the total free energy of the 
system coming from the normalization of Pt{Q, E). The 
sum of Pt{E, Q) over all possible energies E provides the 
probability for the system to have a specific value Q at 
temperature T, which in turn, by reversing the Boltz- 
mann's weight, gives the potential of mean force along 
the reaction coordinate Q 

WriQ) = -RT\n[PT{Q)] . 

We computed the specific heat profile as a function of 
the temperature: its peaks and shoulders locate those 



temperatures at which the main structural chain rear- 
rangements occur A detailed characterization of the fold- 
ing/unfolding process can be obtained by measuring the 
probability of native contact formation as the tempera- 
ture is varied 



where the average (• • • ) is taken over time assuming the 
dynamics to be ergodic. Pij{T) typically features a sig- 
moidal shape, keeping values close to 1 at low tempera- 
tures and decreasing to zero at high temperatures. The 
knowledge of probabilities Pij (T) allows for a classifica- 
tion and ranking of native contacts according to their 
" thermodynamic relevance" [i^, l43l li^ thus suggesting 
possible reaction pathway, key residues and folding 
nucleus lia. 



A. <I> values 

The comparison of Go and the SHG models on the 
WW domain provide the opportunity to study the rel- 
evance of topological versus energetic frustration |4^ in 
the folding mechanism. This can be accomplished by <I> 
values computation and by the further comparison with 
experimental data. $-values 45j measure the perturba- 
tion effects of a site-directed mutation which, by alter- 
ing the free energy difference among native, transition 
and unfolded states may affect the thermodynamics and 
the kinetics of the reaction. A prevalence of topological 
or energetic frustration may be argued from a better fit 
with the experiments of the Go-derived or SHG-derived 
$- values respectively [2^ . The $-vahies can be com- 
puted through a kinetic approach from the folding and 
unfolding rates of the mutant and wild- type protein |48| : 



$ = 



RT\og{kY^/kf"*) 



i?riog[(A:f ^/fc].""*) • (fc^V^r^)] 



(7) 



where R is the ideal gas constant, T is the absolute tem- 
perature and kf and ku are the folding and unfolding 
rates respectively. The denominator of the above expres- 
sion is just the total stability change A AG". The use of 
equation [7| is computationally demanding as it requires 
a simulation for each mutation. This motivates the use 
of a thermodynamic strategy for the <I>-value evaluation 



AAGt AAG^^ - AAG^ 



AAGO AAG^ - AAG^ 



(8) 



where AAG^ is the change in stability of the free-energy 
barrier between the native and denaturated state. Equa- 
tion |S1 is equivalent to equation [7| when Kramer- like the- 
ory applies 47,] . 

If the effect of the mutations is sufficiently small, then, 
following Ref. the $-values can be derived by a free 
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energy perturbation (FEP) approach: 

^ ^ log{exp{-AE /RT})ts - logjeM- ^E/RT})u 

log{eiip{- AE / RT}) F - log{exp{-AE/RT})u 

(9) 

where the Boltzmann factors depend on the energy dif- 
ference between the mutant and the wild type (WT) and 
the averages are computed over WT-conformations of the 
folded (F), transition state (TS) and unfolded (U) ensem- 
bles. 

In the present paper, the <l>-values are computed ac- 
cording to equation El using a method developed in Ref. 
[2^ that can be summarized in the following steps: 

1. Determination of the folding temperature Tf from 
the specific heat plot. 

2. Analysis of the free energy profile at temperature 
Tf plotted as a function of a suitable reaction coor- 
dinate. The free energy profile of a two-state folder 
typically shows a double-well shape which allows 
to choose three windows of the reaction coordinate 
identifying the folded, transition state and unfolded 
ensembles respectively. 

3. Dynamic simulation at T ^ Tf and storage of con- 
formations belonging to the F, TS and U ensembles. 

4. Choice of mutations and computation of FEP $- 
values. 



Go-model 

A folding simulation was performed through a gradual 
cooling of a random coil structure from a temperature 
T = 1.5 down to 0.5 in forty steps. The specific heat 
profile. Fig. |2 is characterized by a single narrow peak 
at temperature Tf = 1.0 suggesting a possible two state 
process. 




^.5 1.0 1.5 

T 

FIG. 2: Heat capacity as a function of temperature in Go- 
model simulations: folding (solid), unfolding (dotted). Inset: 
thermal behaviour of energy; dotted lines represent quadratic 
fits of the baselines. 



Structural information about the native-likeness of the 
transition state was also gained from the so-called struc- 
tural $-values I49II: 



_1 Ejec(i) Prsihj) 

N. 



(10) 



where Pp{i,j) and PTsihj) are the frequencies of the 
native contact i—j in the folded and transition ensembles 
respectively, and the sums run over the set C{i) of native 
contacts in which residue i is involved. 



III. RESULTS 

We report on the thermodynamic properties and con- 
tact formation patterns observed in unfolding/refolding 
equilibrium MD simulations of the WW domain. We 
first analyze the simulations based on Go-model and then 
we discuss the corresponding scenario in the SHG-model 
approach. Since the implementation of the SHG model 
requires a well designed sequence we employed the 6-40 
truncated sequence already optimized in Ref. For 
the sake of a consistent comparison with Go-simulations, 
the corresponding fragment was extracted from the NMR 
structure stored in the PDB file INMV 01. 



The same conclusion can be drawn from the ratio 
of van't Hoff over the calorimetric enthalpy changes 
amounting to 0.74 without and 0.99 with standard base- 
line subtraction "5^. 

The folding/unfolding processes are reversible in tem- 
perature as shown by the agreement between specific 
heat plots. The other observables used to characterize 
the folding transition such as, RMSD, overlap and gyra- 
tion radius exhibit an abrupt change in correspondence 
to the folding temperature Tf (Fig. |3J). Free energy pro- 
file (Fig^Jl as a function of the overlap, around the folding 
temperature, clearly features two distinct wells identify- 
ing the folded and unfolded ensembles separated by a 
barrier corresponding to the transition state conforma- 
tions. 

The shape of the free-energy plot suggests a choice of 
overlap windows for the sampling of conformations in the 
three ensembles F, U and TS (see caption of Fig^ for 
the computation of <&- values (Methods). In figure Owe 
compare our single site simulated ^-values (Eq. ^ with 
the experimental data by Gruebele 0. In the Go-like 
approach, a mutation can be modeled as the removal of 
a single native contact |22j| or in alternative, as an aver- 
age over all possible removals of contacts involving the 
same residue. We followed the second strategy consider- 
ing only contacts |j — i| > 3. In this scheme we cannot 
evaluate <I>- value of SerlS because it lacks such contacts. 
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1.50 



FIG. 3: Structural parameters monitored during the Go- 
model folding simulations. Triangles are the reduced gyration 
radius; filled circles indicate reduced RMSD; open circles re- 
fer to the fraction of native contacts Q magnified by a factor 
4. Each point in the plots corresponds to an average of 6 x 10^ 
conformations sampled every 10^ time-steps. RMSD and Q 
are computed using the PDB structure as a reference. 
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FIG. 4: Free energy profiles at different temperatures as a 
function of the overlap Q (Lower panel) . Vertical lines indi- 
cate the the boundaries of the sampling windows for F (0.85 < 
Q < 1.00) U (0.20 < Q < 0.35) and TS (0.60 < Q < 0.65) 
see text. Upper panel: time evolution of Q at the folding 
temperature T — 1 oscillating between the minima of the free 
energy wells. 



The theoretical $-values in figure [S] vary in the range 
[0.0,0.5], whereas the experimental ones are distributed 
in a much wider interval. This feature is an expected 
result of the very limited energetic frustration of the Go- 
force fields ■ 

The discrepancy is reflected by the modest value of the 



FIG. 5: a) Comparison between experimental (solid circles) 
and theoretical (open circles) "^-values restricted to the mu- 
tations performed in Ref. 0] except for Serl8. "^-values were 
computed from the conformations sampled in a Go simula- 
tion at folding temperature using the perturbation method 
The inset shows the linear regression analysis between 
the two data sets with a correlation coefficient 0.54. b) En- 
largement of theoretical $- value shown in panel a) . c) Struc- 
tural <E>-values computed from formula (IIOII using the (F) and 
(TS) ensemble structures. The three peaks in the plot indi- 
cate that the two loops and the first hydrophobic cluster are 
native-like in the transition state. 



linear correlation coefficient r = 0.54, (see regression line 



7 



in the figure [SJi inset). Of course we cannot exclude that 
a possible improvement of $-value accuracy might be 
achieved either by employing other mutation implemen- 
tations or by using alternative contact maps accounting 
for the high flexibility of the native structure of peptides 
and small proteins |5ll | . Despite this not high correlation, 
the theoretical ^-values provide a qualitative indication 
about the molecule regions that are still native- like in the 
transition state. The plot in fact indicates that the sites 
most sensitive to mutations are those in the region of 
loops LI and L2 in agreement with experimental results 
(see Figure EId). 

The picture provided by the structural $-values 
(Fig. ISJ;) is consistent with that derived from the per- 
turbation method, in fact also in this case the highest $- 
values correspond to residues located in LI (Ser 19), L2 
(Thr 29) or in the neighborhood of the first hydrophobic 
cluster CLl (Pro 8). The low $-values pertain mainly 
to residues in strands /3i and /?2 suggesting that these 
two regions are unlikely to be in contact in the transition 
state. 
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FIG. 6: Thermal behaviour of heat capacity (main figure) 
and energy (inset) during unfolding (solid lines) and fold- 
ing (dashed lines) SHG simulations. The thermodynamic ob- 
servables have been computed using the weighted histogram 
method. 



A. SHG-model 

Ten independent folding simulations starting from 
random-coil conformations, were performed through a 
gradual cooling schedule from temperature T = 1.0 to 
T = 0.01 in 40 steps. The final structures were further 
relaxed by a steepest descent cycle until the maximal to- 
tal force per monomer reached a value smaller than 10^^ 
Kcal mol~^A~^. We obtained different folds and chose 
the conformation with lowest energy {E = — 19.0035e) 
and lowest RMSD (4.74 A) from the PDB structure as 
the reference structure Fo (Fig. However, the simu- 
lations revealed also the existence of another degenerate 
minimum with the same energy and specular to Fo re- 
sulting in much higher RMSD. 

Despite the large value of RMSD, Fq correctly displays 
the topology of a triple-stranded, antiparallel /3-sheet, 
that however lacks the typical twist of the PDB struc- 
ture making loop L2 almost perpendicular to loop LI 
(see Figure^. As a result, the folded structure is much 
more compact than the real protein and has a much larger 
number of native contacts (71 versus 41). The fact that 
22 out of the 41 PDB contacts are also present in the 
folded structure is an indication of the satisfactory struc- 
tural performance of the SHG simulation. 

Structure Fq was then denaturated through ten in- 
dependent runs with the same but inverse temperature 
schedule, involving a thermalization stage of 6 x 10^ time 
steps {At — 0.005) at each temperature followed by a 
run over the same length, where control parameters were 
measured to assess the unfolding progress. The course of 
both folding and unfolding simulations was monitored 
through the analysis of the energy, the specific heat, 
RMSD from Fq, the overlap and radius of gyration Rg. 

Both the folding and unfolding specific heat plots 



(Fig. are characterized by the presence of a main peak 
and a shoulder. The peaks of the folding and unfolding 
thermograms Pf and P„ are located at Tf = 0.36 and 
Tu = 0.33 respectively whereas the shoulders Sf and Su 
correspond to Tsf = 0.24 and Tsu = 0.28. The folding 
process appears not to be fully reversible probably due 
to the fact that the sequence, although designed, is not 
yet a good folder. 

The existence of the shoulder in the folding C„ plot 
is a signature of a non cooperative folding mechanism 
in which, an initial collapse is followed by a structural 
chain rearrangement characterized by a significant in- 
crease in the number of native contacts unaffecting the 
overall compactness of the molecule (see Fig. [T)). 

This is confirmed by the thermal fluctuation of the 
structural overlap 

varQ(r) = (Q2) _ (g)2 

featuring the highest peak, not in correspondence of the 
main peak Pf but at the temperature Ts / of the shoulder 
(inset of Fig.d). 

The marked difference between the folding and unfold- 
ing specific heat suggests the opportunity to consider the 
free energy profiles Wt{Q) to better determine the fold- 
ing temperature. The profiles (lower panel of Fig. |HJ 

indicate that the transition is characterized by the 
presence of two wells separated by a barrier and the tem- 
perature where these two wells are evenly populated is 
T = 0.237. This confirms that, the peak of Cy is mainly 
related to the 0-collapse, whereas the shoulder corre- 
sponds to the folding transition. In fact, kinetic simula- 
tions at temperature T = 0.237 show that the time evo- 
lution of Q{t) exhibit jumps between the two free energy 
wells (upper panel of Fig|HJl . The double- well shape of the 
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FIG. 7: Noncooperativity of the SHG folding simulation yield- 
ing our best final structure Fq. After a first collapse the radius 
of gyration remains constant whereas the structural difi'er- 
ence from the reference structure 1 — Q keeps on decreasing 
at temperatures corresponding to the shoulder of the Cv plot 
signalling a massive structural rearrangement. Inset: tem- 
perature dependence of fluctuations of the structural overlap. 
The main peak, located at the same temperature as the heat 
capacity shoulder, corresponds to the folding temperature. 

time 
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FIG. 8: Free energy profile as a function of the overlap Q at 
the folding temperature T = 0.237 (lower panel). The vertical 
lines mark the boundaries of the Folded (0.78 < Q < 0.90), 
Transition State (0.53 <Q < 0.61) and Unfolded (0.30 <Q < 
0.47) ensembles. The upper panel shows the typical temporal 
evolution of the overlap in a sub-interval of a simulation at 
folding temperature. The overlap was sampled every 5 x 10"^ 
time-steps. 



free-energy profile again allows to sample conformations 
in the folded (F), transition state (TS) and unfolded (U) 
ensemble used to implement the perturbation technique 
for value computation (Methods). The plot in figure |51 



shows the ^-values restricted to the set of residues mu- 
tated by Gruebele [3- 



I ' ' ' ' I ' ' ' ' I ' ' ' i I ' ' ^ ' 
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FIG. 9: Experimental (solid circles) and computed (open cir- 
cles) <I>- values. $- values were computed from the conforma- 
tions sampled in a SHG simulation at folding temperature 
using the perturbation method. The two profiles show a qual- 
itative agreement although the correlation coefficient of the 
regression line (see inset) is r — 0.65. 

For each site we tested the effect of all the possible 
single mutations allowed by the model, namely two hy- 
drophobicity shifts and two shifts in the secondary struc- 
tural bias for each of the two dihedral angles flanking 
the residue under examination. For each site we chose 
the least perturbative mutations. Theoretical ^-values 
feature two major peaks in correspondence with loop LI 
and loop L2 which is a qualitative resemblance with ex- 
periments. A more quantitative comparison is provided 
by the correlation coefficient between theoretical and ex- 
perimental data amounting to r = 0.65. 

The set of native conformations collected during the ki- 
netic simulation provides a structural characterization of 
the ensemble F whose most interesting feature is the clus- 
tering of native-basin conformations in two main subsets 
characterized by non overlapping distributions of RMSD 
from the reference structure Fq. This is a further indi- 
cation of the high level of frustration of the free-energy 
landscape associated to the sequence and it is in agree- 
ment with the findings by Miller and Wales |52] about the 
glass-like structure of the energy landscape in a closely 
related model 'ss'l . This partitioning of the native basin 
is evident in figure ITUl 

where we plot the free energy versus the RMSD which 
is a structural indicator more sensitive than the over- 
lap. A finer analysis reveals that the structures in the 
two sub-basins of the native valley correspond to differ- 
ent chiralities but similar energies. The absence of such 
a partitioning in the same plot for the Go force field, in- 
dicates that this feature is mainly peculiar of the model 
rather than this specific protein. 
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FIG. 10: Upper panel: low-temperature free energy profiles of 
the SHG model as a function of the RMSD from the reference 
conformation To . The native valley appears to be partitioned 
in two main sub-basins separated by a barrier. The sub-basin 
corresponding to the RMSD range [0.25—1.00] is populated by 
conformations with the same chirality as the PDB structure, 
whereas the sub- valley in the range [2.80 — 6.50], corresponds 
to the opposite chirahty. Lower panel: low-temperature free 
energy profiles of the Go model as a function of the RMSD 
from the native conformation (pdb-id = INMV). The native 
valley shows a single basin as opposed to the partitioning in 
two sub- valleys typical of the Go model. 



To clarify how the landscape properties affect the re- 
versibility of the folding process we studied the fold- 
ing/unfolding transition from the contact formation 
probabilities P^. In particular, as the plots Pij(T) are 
typically sigmoid, a contact can be regarded as bro- 
ken/formed in correspondence of the temperature, where 
the absolute value of the slope of the probability curve 
is maximal. This allows to identify the contacts whose 
formation/breakdown occurs at a given temperature. 

To analyze the folding/unfolding process, we consid- 
ered three temperature windows corresponding to differ- 
ent regions of the specific heat plots. The first window 
(T < 0.15) refers to the pre-transition baseline of the 
plot, the second window (0.15 <T< 0.30) insists on the 
region of the shoulder, and the third window (T > 0.30) 
includes the main peak. The contacts appearing or dis- 
appearing in correspondence of the three windows are 
shown in black, red and green respectively, in the con- 
tact maps (Fig. Ill|l summarizing the main events of the 
pathway. Shaded symbols represent weak interactions 
with probability of formation below 50% at the lowest 
simulation temperature T — 0.01. 

The contact map shows that the first contacts formed 
during folding are located in the intermediate part of 
sheet /3i-/32 and in the region of sheet /32-/33 most distant 
from loop 2. The formation of these contacts is respon- 
sible for the collapse of the molecule into a compact but 
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FIG. 11: Gontact maps summarizing the folding and unfold- 
ing SHG process. The color code identifies three tempera- 
ture ranges: black, T < 0.15; red, 0.15 < T < 0.30; green, 
T > 0.30. Shaded symbols refer to weak contacts with low 
probability of formation (Pij < 0.5) at the lowest simulation 
temperature T = 0.01. 



not completely folded conformation. The map also shows 
that the shoulder of the Cy plot is characterized by the 
zipping of sheets /3i-/32 and ^2-^3 towards loop 1 and 
loop 2 respectively. During the process there also oc- 
curs the locking of P1-P3, (32-tail and head- [32 contacts 
(hereafter, the term 'head' and 'tail' indicate the amino- 
terminal region Lys6 — GlylO and the carboxy-terminal 
residues Asn36 — AsnAO respectively). These contacts 
are not present in the PDB structure and they arise as 
a consequence of the higher compactness of Fq, so that 
their formation probabilities are always below 50%. The 
folding is completed by the appearence of a few contacts 
between residues very far from each other along the pro- 
tein chain. 

During the unfolding reaction no "native" (with re- 
spect to Fq) contact breaks down in the low tempera- 
ture window because the heating schedule enables the 
protein to escape easily from kinetic traps making the 
process much less gradual than folding. This reflects on 
the smaller number of contacts broken in the shoulder 
region as compared to the number of contacts formed in 
the same temperature range during folding. In partic- 
ular, the cleavage occurs of Pi-Ps, (32-tail and head-(32 
contacts, whereas the dissolution of the contacts of loop 
1 and loop 2 is delayed to the region of the peak of the 
plot where most f3i-f32 and (32-(33 contacts also disappear. 

The comparison of the two contact maps thus reveals 
that the sequences of the molecular events in the fold- 



10 



ing and unfolding processes are basically reverse to each 
other even if the unfolding is a more abrupt phenomenon 
occurring in a narrower temperature window. 

IV. DISCUSSION 

Our results indicate how the different approaches of 
structure-based and sequence-based description exempli- 
fied by Go and SHG model are appropriate to simulate 
complementary features of the folding process. The Go 
model, in fact, being based on the influence of the native 
state topology on the folding process is independent from 
the amino-acid sequence and it completely disregards the 
chemical properties of the molecule. The SHG model, on 
the other hand, is a minimal model, where the chemical 
features of amino-acids are partially included determin- 
ing the folding driving force. 

Our simulations showed that the Go model with angu- 
lar bias and rescaling [s^ can correctly reproduce the 
reversible, cooperative, two-state mechanism of folding of 
hPinl WW domain |5|. The reversibility, indeed appears 
from the almost perfect superposition of the Cy plots 
of folding and unfolding. Several elements, on the other 
hand, suggest a cooperative, two-state mechanism: the 
Cv plots show a single sharp peak, the ratio of the van't 
Hoff to calorimetric enthalpy is close to 1 (kj*^ = 0.99) 
[53.] , all the indicators used to monitor the similarity with 
the native state exhibit a sharp sigmoidal thermal be- 
haviour and the barrier between the two frcc-cncrgy wells 
at the folding temperature is very high 54, SSj. The 
results from the simulation using the SHG model were 
rather ambiguous. The simulated thermograms featured 
not only a peak, but also a shoulder at lower tempera- 
ture. This is the signature of a non cooperative folding 
involving a collapse into a compact, only partially struc- 
tured globule, followed by a rearrangement into a native 
conformation. This scenario is confirmed by the thermal 
behaviour of the structural parameters (overlap, gyration 
radius and RMSD from native structure) used to monitor 
the folding reaction. The results are consistent with the 
findings by Nymeyer et al. ,5Q,] and by Guo and Brooks 
HI j4j, in their simulations on the model by Honeycutt 
and Thirumalai [s^l- The SHG formulation, although be- 
ing an improvement of the latter model, still retains some 
of its drawbacks. Indeed, the conformations of the native 
ensemble (F), sampled at the folding temperature, can be 
clustered in two groups with non-overlapping RMSD dis- 
tributions and opposite chiralities '57]. The existence of 
two distinct clusters of native-like conformations can be 
easily explained by examining the low-temperature free- 
energy profiles as a function of the RMSD (from reference 
structure Fq): the native basin appears to be partitioned 
in sub-basins separated by barriers. The partitioning of 
the native basin is likely a feature that the SHG model 
inherited from Thirumalai model. Miller and Wales , 
in fact, analyzed the disconnectivity graph of the poten- 
tial energy surface of Thirumalai's force field drawing the 



conclusion that the energy hypersurface is not a single 
funnel, but it contains low-energy minima separated by 
high barriers. 

Presumably, the reason for the degeneration of the na- 
tive state of the SHG model relies in the symmetry of the 
dihedral potential V^, Eq. In particular, the sequence 
designed to represent the hPinl WW domain, contains 
only "Extended" or "Turn" symbols, so that is a poly- 
nomial in cos(0) and becomes symmetric for the inver- 
sion (j) —^ —(j). The symmetry of the term, however, 
is not the only reason for the poor performance of the 
SHG model. In fact, we find that the energy histograms 
of the Folded- and Unfolded- state ensembles are signifi- 
cantly overlapping thus suggesting the existence of many 
low-energy, non- native conformations |4ll |. 

We suspect that this is an effect of the only approxi- 
mated maximization of the energy gap between the na- 
tive conformation and the decoy set used in the sequence 
optimization procedure |11| . This would call for further 
refinements of the threading procedure. 

Despite the several drawbacks, the SHG model enabled 
the computation of perturbation values in qualitative 
agreement with experimental data. The linear correla- 
tion coefficient between theoretical and experimental <&- 
values (r = 0.65) is actually better than the one yielded 
by the Go simulation (r = 0.54). The explanation of 
these result must be sought in the partial incorporation 
of the chemistry in the SHG description. Indeed, real mu- 
tations are chemical transformations of the molecule and 
they are better simulated by a chemically-based model 
such as the SHG rather than by a topological model. 
In the Go model, in fact, mutations are generally sim- 
ulated by the removal of native contacts i22] ■ however, 
they may affect all the interactions a residue is involved 
in. The SHG model, conversely, offers the possibility to 
treat mutations in a more realistic way because it im- 
plements shifts in the hydrophobic character of residues 
or changes in the secondary structural bias of dihedral 
angles. Moreover, the better agreement of experimental 
data with the SHG-computed <I>-values may show that 
the folding mechanism of hPinl WW domain is controlled 
not only by topological but also by energetic factors. 

The significant differences, beyond statistical errors, 
between the $-values profiles yielded by the two models, 
in our opinion, reflect the different strategies upon which 
the two models are built. 

A final issue that deserves some discussion is the qual- 
ity of the structural prediction using the SHG model. 
The SHG is a minimal model based on chemical proper- 
ties of the system and a good outcome of the simulation 
is not a priori guaranteed. The simulations show that, 
apart from chirality problems, the best final structure Fg 
(Fig. ^1 presents the correct topology of a three-stranded 
antiparallel /3-sheet of the hPinl WW domain, even if 
the structure appears to be more compact. This how- 
ever, does not prevent the correct formation of both hy- 
drophobic clusters. Moreover, Fq shares 22 of the 41 
native contacts of the PDB structure. 
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V. CONCLUSIONS 

We performed folding and unfolding simulations of the 

WW domain of liPinl protein that represents an excel- 
lent candidate to test folding algorithms and models due 
to the availability of a large amount of structural, ther- 
modynamical and kinetic experimental data. The pur- 
pose of the work was to compare the performance of the 
Go and SHG models that represent two different strate- 
gies to the folding problem. Our simulations indicated 
that for the specific WW domain considered in this work, 
the Go model with angular bias and rcscaling, correctly 
reproduces the cooperative, two-state, reversible folding 
mechanism, whereas the SHG model does not. The rea- 



sons for the limitations of the SHG model must be sought 
in the insufficient optimization the sequence and in the 
non-funnel shape of the landscape. As a consequence, the 
present version of the SHG model does not allow reliable 
predictions of the folding mechanism. The satisfactory 
performance of the SHG model in the computation of 
<I>- values, however, clearly shows the importance of in- 
corporating the chemical properties of the sequence in a 
protein model. Our work, highlighting the limits of the 
SHG model, is thus intended to be a starting point for 
a further refinement of the model, in the firm belief that 
coarse-grained, minimal models represent viable alterna- 
tives to computationally demanding all-atom simulations 
in investigations of large-sized, slow-folding proteins. 
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